---
title: 'parlCymru: a dataset of spoken contributions from the Welsh Parliament'
author: "Daniel Braby"
date: "2021-08-06"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Description

parlCymru aims to provide the full text of all spoken contributions in the Senedd Cymru/Welsh Parliament (formerly the National Assembly for Wales). The present version covers all recorded speeches of the Fifth Senedd, from 2016-05-05 to 2021-05-05. Metadata includes the speaker, their party, gender and electoral district, as well as the title and date of the debate. Debates and members are each identified by a unique ID. A crosswalk to Wikidata for members is included, both for future integration with aggregate datasets of legislators and for easily pulling in additional variables. A dataset of Members of the Fifth Senedd, including their Twitter handles, is also provided. Text is available in both English and Cymraeg, with an additional variable recording the language spoken in Parliament. Two files, "Corp_Senedd_en_V2.rds" and "Corp_Senedd_cy_V2.rds", provide versions compatible with Rauh & Schwalbach's "ParlSpeech V2" dataset for comparative analyses. Full replication materials are available as a single R script. (2021-08-06)


## Schema Info

### Main dataset (available as csv/rds)

- x = Row number
- daily_order_no = Order number by session date
- meeting_id = Unique ID of Meeting
- date = Date of session
- contribution_id = Unique ID of Contribution
- debate = Name of Debate
- debate_id = Unique ID of Debate
- order_by_meeting = Session-level order number
- member_name = Name of Member
- member_role = Given role of Member
- wikidataid = Wikidata ID
- ms_type = Constituency or Regional MS
- party = Name of Party
- constitunecy = Name of Constituency (if applicable)
- region = Name of Region (elected to, or in which the constituency lies)
- gender = Gender (M/F)
- is_speech = Whether the text is a speaker-identified oral contribution (0/1)
- language spoken = Language spoken in Parliament (En/Cy)
- contribution_en = Full text of spoken contribution in English
- contribution_cy = Full text of spoken contribution in Cymraeg
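
As a rough illustration of how these columns fit together, the sketch below builds a three-row toy data frame with a handful of the variables listed above and keeps only the speaker-identified contributions. The rows and their values (member names, parties, text) are made up for the example and are not records from the dataset.

```{r}
# Toy rows mimicking a few columns of the main dataset (all values are made up)
toy <- data.frame(
  contribution_id = c(1, 2, 3),
  member_name     = c("Member A", NA, "Member B"),
  party           = c("Party X", NA, "Party Y"),
  is_speech       = c(1, 0, 1),
  contribution_en = c("First speech.", "Procedural text.", "Second speech."),
  stringsAsFactors = FALSE
)

# Keep only speaker-identified oral contributions
speeches <- toy[toy$is_speech == 1, ]

# Count the remaining contributions per party
speech_counts <- table(speeches$party)
print(speech_counts)
```

The same filter should carry over to the real data once it is loaded, provided `is_speech` is read in as numeric 0/1.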


### ParlSpeech (En/Cy)

- date = Session date
- agenda = Focus of debate (ID)
- speechnumber = Order number by session date
- speaker = Name of Speaker
- party = Name of Party
- chair = TRUE if the speaker is the Presiding Officer or Deputy Presiding Officer, otherwise FALSE
- text = Spoken contribution
- parliament = "Cy-Senedd"
- iso3country = "GBR"

Note: ParlSpeech V2 data also contains party.facts.id, which is not applicable to Wales.


### Members

- member_name = Name of Member
- member_id = Unique ID of Member
- member_role = Given role of Member
- wikidataid = Wikidata ID
- party = Name of Party
- constitunecy = Name of Constituency (if applicable)
- region = Name of Region (elected to, or in which the constituency lies)
- gender = Gender (M/F)
- ms_type = Constituency or Regional MS
- twitter = Twitter handle (future update: numeric user ID)

### Wikidata: Integrating further Variables

Wikidata contains a wealth of demographic and political data about politicians. Using the wikidataid variable featured in the main datasets, further variables can easily be added in R with the `tidywikidatar` package from the European Data Journalism Network. Below is an example that retrieves each MS's date of birth and joins it to the main dataset.

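```{r}
# Load the packages used below
library(tidyverse)
library(tidywikidatar)

# Read the main dataset (the file is a csv, so use read_csv, not read_rds)
parlCymru <- read_csv("~/parlCymru_5th_Senedd.csv")

# Look up date of birth (Wikidata property P569) once per member,
# skipping rows that have no Wikidata ID
ids <- unique(parlCymru$wikidataid)
ids <- ids[!is.na(ids)]
dob <- tw_get_property(id = ids, p = "P569", language = "en")

# Keep one value per member so the join below does not duplicate rows
dob <- dob %>%
  select(wikidataid = id, date_of_birth = value) %>%
  distinct(wikidataid, .keep_all = TRUE)

# Attach date of birth to every contribution
parlCymru <- left_join(parlCymru, dob, by = "wikidataid")
```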
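
The `date_of_birth` values retrieved this way are likely to be character strings rather than dates. The sketch below assumes they arrive as Wikidata-style timestamps such as `+1970-01-31T00:00:00Z` (an assumption worth checking against your own output) and shows one way to convert them and derive a member's age at the start of the period covered, 2016-05-05. The example value is made up.

```{r}
# Made-up example values in the timestamp format assumed for Wikidata dates
date_of_birth <- c("+1970-01-31T00:00:00Z", NA)

# Drop the leading sign and the time part, then parse as a Date
dob_date <- as.Date(substr(date_of_birth, 2, 11))

# Age in completed years at the start of the Fifth Senedd coverage
ref <- as.Date("2016-05-05")
age <- as.integer(format(ref, "%Y")) - as.integer(format(dob_date, "%Y")) -
  (format(ref, "%m%d") < format(dob_date, "%m%d"))
```

Missing dates of birth stay `NA` throughout, so members without a Wikidata entry for P569 are not silently assigned an age.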

